Summarizing physical performance in professional soccer: development of a new composite index

The aims of this study were to create a composite index to measure the overall players’ physical performance in professional soccer matches and analyze the effect of individual playing time and positional differences on this composite index. A total of 830 official matches from LaLiga men’s first division and Spanish Copa del Rey were analyzed, which resulted in 24,980 match observations collected from 1138 male players (forwards, n = 286; midfielders, n = 441; defenders, n = 411). The physical performance variables, which represent the locomotor demands, were collected using electronic performance tracking systems. A Partial Least-Squares Structural Equation Model (PLS-SEM) was used to measure performance. The PLS-SEM output had three significant latent components, which explained 95% of the initial variability, that were related to the acceleration-specific performance (component 1), high-intensity running-related variables (component 2), and medium intensity actions variables (component 3). Also, a linear regression analysis was used to explore relationships between playing activity time (hours—X axis) and the composite index (10-point scale—Y axis), in which a strong and positive correlation was observed between individual playing time and the composite index (r = 0.76; p < 0.001; R2 = 0.58). Also, significant positive correlations were observed in forwards (r = 0.85; p < 0.001; R2 = 0.74), midfielders (r = 0.80; p < 0.001; R2 = 0.64), and defenders (r = 0.67; p < 0.001; R2 = 0.45). However, significant differences between playing positions with a small effect size (p < 0.05; eta-squared = 0.01) were found. From a practical perspective, this study may serve as a reference for sports performance practitioners to create a composite index that measures the overall players’ physical performance. The instructions to create this index are available in the manuscript.

Soccer stands as the most widely embraced sport across the globe, engaging millions of participants worldwide, although only a select few transition to the professional level 1. This team sport is typified by an intermittent activity pattern, involving high-intensity anaerobic actions interspersed with periods of reduced intensity 2,3. Soccer matches impose the most significant load in a single session within the weekly training regimen, and coaches typically take this into account when structuring the weekly training plan 4. Consequently, it is imperative for sports performance and medical professionals to assess physical performance to ensure that players are adequately prepared for the demands of competition 5.
In recent years, the adoption of electronic performance and tracking systems (EPTS) has allowed professionals to gain a deeper understanding of the physical demands in soccer 6,7. These data can be collected for both matches and training sessions 8. These tracking systems encompass camera-based technologies and wearable devices, incorporating a mix of positioning systems (e.g., Global Positioning Systems or Local Positioning Systems), inertial measurement units (e.g., accelerometers, gyroscopes, and magnetometers), and physiological monitors (e.g., heart rate monitors) 6,7. Specifically, EPTS allow the gathering of external load (e.g., distance covered, accelerations, decelerations, or sprints) and internal load (e.g., psychophysiological responses like mean or peak heart rate) data 2,6,7. Therefore, these tools are invaluable to professionals, since the information obtained from monitoring players during competition informs decisions not only about the training schedule but also during a session (e.g., live performance data) 9.
Nonetheless, managing data and interpreting variables pose significant challenges for sports performance and medical experts using EPTS 2,10,11. Practitioners often receive daily physical performance reports containing approximately 100 to 200 variables 10. As a result, there is a need to condense these large datasets, requiring practitioners to employ appropriate methods to pinpoint and select key performance indicators after each session's data collection 2,10,11. In this regard, recent research suggested the use of the Partial Least-Squares Structural Equation Model (PLS-SEM) to measure performance 12,13. Utilizing freely accessible key performance indicators sourced from sofifa.com, previous research developed composite indicators for mobile players in the top 5 European Leagues using a Third Order PLS-SEM model, albeit without considering physical performance 12. Overall, PLS-SEM provides a flexible (i.e., non-parametric) and robust method for analyzing composite indicators, rendering it a valuable resource for researchers in diverse domains such as social sciences, management, and economics 12.
However, when measuring performance using PLS-SEM, the effect of playing time and playing position should be taken into account. The reason is that there may be differences in playing time between matches (e.g., differences in match duration due to extra time or 30-min overtime periods) 14,15, in players' participation in match-play (e.g., starters vs non-starters) 16,17, and in physical demands by playing position 18,19.
Therefore, the aims of this study were to (1) create a composite index to measure the overall players' physical performance in professional soccer matches and (2) analyze the effect of individual playing time and positional differences on the new index. The hypotheses were that (1) the PLS-SEM could create a composite index to measure the overall players' physical performance, but (2) individual playing time and position would be influential features due to significant correlations between time and performance as well as differences between playing positions.

Study design
This observational, retrospective study analyzed a total of 830 official matches. Data were collected from 42 professional soccer teams participating in LaLiga 2021-22 Men's First Division and Spanish Copa del Rey. The physical performance variables were collected using EPTS. Each match had a duration of 90 min plus additional time. Six matches included an overtime period of 30 min as they were part of Copa del Rey.

Participants
A total of 24,980 match observations were collected from 42 teams, including 1138 male professional soccer players. Each player was categorized based on the following positions: forwards (n = 286), midfielders (n = 441), and defenders (n = 411). The players were included in the study if they participated either in the Spanish Men's First Division League or Spanish Copa del Rey in the 2021/2022 season. All players' performance data were considered for this study (including any substitutions and the extra-time period from Copa del Rey matches). Due to the different nature of their activity profile, goalkeepers were not included in the study. All the information was sourced from LaLiga, which permitted the examination of variables investigated in this study and the dissemination of results with a scientific aim. Adhering to LaLiga's ethical standards, this study abstains from disclosing any data that could identify individual soccer players. We confirm that all methods were carried out in accordance with relevant guidelines and regulations; in particular, all experimental protocols were approved by LaLiga (www.laliga.com); subsequently, informed consent was obtained from all subjects. LaLiga granted permission for the use of these data in this investigation, which received approval from the Institutional Review Board.

Procedures
Performance data were gathered using the computerized multi-camera tracking system TRACAB Gen4 (ChyronHego, New York, USA), which is a recognized and valid technology for soccer-specific performance analysis 20. This system recorded positioning and motion data through a computerized multi-camera approach. Subsequently, a customized report was generated with the assistance of Mediacoach software (www.mediacoach.es, LaLiga, Madrid, Spain). This software synchronized the tracking data with video footage of each match. Also, to ensure data accuracy, a quality control process was implemented by Mediacoach after each match. This process involved cross-referencing the TRACAB data with TRACAB's own algorithm and conducting a player-by-player review to rectify any potential errors inherent in the optical tracking technology. This quality control procedure not only enhances the quality of the data but also enables professionals to visualize and analyze the performance tracking data, as outlined in previous work 21,22.
Next, a Principal Component Analysis (PCA) 23 was conducted as an exploratory analysis to detect the latent factors behind the physical performance composite index and to select the most related variables. Specifically, three significant latent components were found (eigenvalues greater than 1), which explained about 95% of the initial variability. For each latent component, the most important variables (i.e., those with a factor loading |λ| > 0.65) were selected after a varimax rotation. This yielded 17 variables, each strictly related to its own component (Table 1).
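The variable-selection logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy is available, runs the PCA on the correlation matrix, and uses a widely circulated SVD-based varimax routine. The function names (`pca_loadings`, `varimax`, `select_variables`) are hypothetical; only the eigenvalue cut-off of 1 and the loading threshold of 0.65 come from the text.

```python
import numpy as np


def pca_loadings(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue-greater-than-1 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (assumed standard SVD-based algorithm)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - rotated @ col_ss / p))
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def select_variables(rotated, threshold=0.65):
    """Map each component index to the variables with |loading| > threshold."""
    return {
        comp: [int(i) for i in np.where(np.abs(rotated[:, comp]) > threshold)[0]]
        for comp in range(rotated.shape[1])
    }
```

In use, `X` would be an observations-by-variables array of the tracking metrics; `select_variables(varimax(pca_loadings(X)[1]))` then lists, per retained component, the column indices that pass the loading threshold.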
Then, a non-parametric approach was adopted for creating the composite indicator: the PLS-SEM approach, a method widely used in social sciences 15. In particular, a hierarchical (Second Order) PLS-SEM algorithm was completed using the smartPLS software (www.smartpls.com, version 3.3.7) and the R package seminR (version 2.3.2, with 5000 bootstrap resamples) 24, following a Mixed Two-Step approach 25 to estimate the Second Order construct 12. Finally, as a player index adjustment phase, each player's index time series was normalized and the index was then translated into a clearer evaluation scale (between 0 and 10), taking 130 min as the maximum target for the players' physical performance (e.g., considering matches with 30 min of overtime plus extra time). Given the nature of this study, the formulas applied at this stage are detailed in the results section.
Once the composite index was created, a linear regression analysis was carried out to explore relationships between playing activity time and the composite index. In addition, the Kruskal-Wallis test was conducted to analyze the differences in the composite index between playing positions. Effect sizes were calculated through eta squared. A larger eta squared value indicated a stronger effect of the independent variable(s) on the dependent variable, while a value closer to 0 suggested a smaller effect size. Specifically, the effect sizes were interpreted as follows: small effect size (eta squared ≤ 0.01), medium effect size (0.01 < eta squared ≤ 0.06), and large effect size (eta squared > 0.06) 26.
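These analyses can be illustrated with a short, dependency-free Python sketch. It is a simplified illustration, not the software used in the study. The function names are hypothetical, and the eta-squared formula, η² = (H − k + 1)/(n − k), is one common rank-based definition assumed here because the text does not state which formula was applied. Only the interpretation thresholds come from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; in simple linear regression, R^2 equals r squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


def eta_squared_from_h(h, k, n):
    """Assumed rank-based eta squared for k groups and n observations."""
    return (h - k + 1) / (n - k)


def interpret_eta_squared(eta_sq):
    """Thresholds as stated in the text: <= 0.01 small, <= 0.06 medium."""
    if eta_sq <= 0.01:
        return "small"
    if eta_sq <= 0.06:
        return "medium"
    return "large"
```

Here `groups` would hold the composite index values of forwards, midfielders, and defenders, and `pearson_r` would be applied to playing time and the composite index.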

Composite index
Figure 1 shows the PLS-SEM output. Specifically, three significant latent components were found, which explained 95% of the initial variability and were related to the acceleration-specific performance (component 1), high-intensity running-related variables (component 2), and medium intensity actions (component 3). From a practical point of view, to compute the composite index for a generic player i, given its set of 17 physical performance variables, the following process needs to be followed (based on the weights w of Fig. 1):
1) First, for each player i, the composite index needs to be computed for each lower order component using formulas (1-3).
2) Then, to compute the raw composite index for each player, apply formula (4).
3) Finally, to obtain the normalized composite index on a 10-point scale (0 being the lowest performance and 10 being the highest) for an easy interpretation of the index, formula (5) should be used for each player i.
where max(Raw composite index i) and min(Raw composite index i) are, respectively, the maximum and minimum raw composite index of the series values (i.e., considering all the players).
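As an illustration of the final rescaling step, the sketch below maps raw composite scores onto the 0 to 10 scale using the series minimum and maximum. It is a hedged sketch rather than the published formulas: the min-max form is inferred from the description of max(Raw composite index i) and min(Raw composite index i), the weighted-sum helper is an assumed generic form for combining variables with weights w, and any weights passed to it are placeholders, not the values from Fig. 1. Both function names are hypothetical.

```python
def weighted_sum(values, weights):
    """Assumed generic composite: sum of weight * value (placeholder weights)."""
    return sum(w * v for w, v in zip(weights, values))


def normalize_to_ten(raw_scores):
    """Min-max rescale raw composite scores to a 0-10 scale across all players."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        raise ValueError("all raw scores are identical; scale is undefined")
    return [10 * (s - lo) / (hi - lo) for s in raw_scores]
```

With this form, the player with the lowest raw score receives 0 and the player with the highest receives 10, which matches the interpretation of the 10-point scale given in step 3.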

Discussion
The purpose of this study was to create a composite index to measure the overall players' physical performance in professional soccer matches and analyze the effect of individual playing time and positional differences on this composite index. The novelty of this study was that this method could reduce all the information collected in the physical performance report to one single variable. Specifically, three significant latent components were found, which explained 95% of the initial variability and were related to the acceleration-specific performance (component 1), high-intensity running-related variables (component 2), and medium intensity actions variables (component 3). Also, a strong and positive correlation was observed between individual playing time and the composite index, although positional differences were also observed.
The physical performance composite index created in this study is a novel approach for the assessment of physical performance. This computational approach, which is based on the PLS-SEM hierarchical model, is an original application in the sport field 12 and, to the best of our knowledge, has never been applied to professional soccer players' performance data. Previous studies have applied other statistical methods to reduce the number of physical performance variables from their reports 2,27. For instance, a recent study selected 7 variables, including metabolic power, total steps, Fourier transform duration, deceleration distance covered (2-3 m/s²), and total running actions (12-18 km/h; 21-24 km/h), which belonged to the first two components of the PCA and explained 80% of total variance 2. Another study found three components in the PCA that represented ~59% of total variance (component 1: distance per minute, explosive distance, distance per minute in zones like 18-21 km/h and 21-24 km/h; component 2: accelerations and decelerations; component 3: maximum acceleration and deceleration) 27. There is therefore a level of similarity between the types of variables identified as important parameters (e.g., mid-intensity and high-intensity running actions: average speed or meters per minute, and actions above 18, 21, or 24 km/h; variables with an acceleration and deceleration component: explosive distance, total accelerations/decelerations, and different speed bands), which may reflect the importance of understanding soccer as a sport characterized by high-intensity actions interspersed with longer recovery periods of lower intensity 2,28.
Furthermore, strong and positive correlations were observed between individual playing time and the composite index, although positional differences were also observed. The positive linear relationship with time was expected because match demands naturally accumulate as players stay on the field 18. However, this was important to analyze to gain a better understanding of how the physical performance composite index would change throughout the course of match-play. In addition, positional differences were observed, in line with the initial hypothesis. Multiple studies have shown that physical performance is dependent not only on playing position, but also on other contextual factors (e.g., team formation, ball in play, competitive standards, match status, etc.) 29-31, so future research could examine how these contextual variables impact the physical performance composite index.
However, this study has some limitations. For example, the physical performance data were collected from video-tracking systems, so no information about the physiological response of the players (e.g., mean heart rate, time spent in different heart rate zones, etc.) was provided. Although future research is required to explore the applicability of these methods to pinpoint and select key performance indicators, it is necessary to ensure that data quality from the original performance reports is examined. Also, the playing positions were categorized in three groups, while there could be a more extended approach based on various team formations. In this regard, future studies could consider specific positions such as central defenders, full-backs, central midfielders, wide-midfielders, and forwards 18,19. Finally, another limitation was that only 6 matches with overtime periods were included in the analysis, which is imbalanced in comparison with the total of match observations from regular 90-min matches.

Practical implications
This study may serve as a reference for sports performance practitioners to create a composite index that measures the overall players' physical performance; the instructions to create it are available in the manuscript. In addition, this composite index may be used for correlation with technical-tactical parameters, which may be an opportunity to understand the weight of physical output on individual and/or team performance. Future research is necessary to have a better understanding of the applicability of this data reduction method not only in professional soccer but also in other sports. In addition, given the importance of the variables that contributed to the three main components of the PLS-SEM output, coaches may consider the variables from Table 1 in the analysis of physical performance.

Table 1. Output of the principal component analysis.